Statistical Encoding of Succinct Data Structures

Authors

  • Rodrigo González
  • Gonzalo Navarro
Abstract

In recent work, Sadakane and Grossi [SODA 2006] introduced a scheme to represent any sequence S = s_1 s_2 ... s_n, over an alphabet of size σ, using nH_k(S) + O((n / log_σ n)(k log σ + log log n)) bits of space, where H_k(S) is the k-th order empirical entropy of S. The representation permits extracting any substring of size Θ(log_σ n) in constant time, and thus it completely replaces S under the RAM model. This is extremely important because it permits converting any succinct data structure requiring o(|S|) = o(n log σ) bits in addition to S, into another requiring nH_k(S) + o(n log σ) bits (overall) for any k = o(log_σ n). They achieve this result by using Ziv-Lempel compression, and conjecture that the result can in particular be useful to implement compressed full-text indexes. In this paper we extend their result, by obtaining the same space and time complexities using a simpler scheme based on statistical encoding. We show that the scheme supports appending symbols in constant amortized time. In addition, we prove some results on the applicability of the scheme for full-text self-indexing.
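
The space bound above is stated in terms of the k-th order empirical entropy. As a rough illustration only (not the authors' encoding scheme), the following Python sketch computes it under the standard definition: the zero-order entropy is H_0(S) = Σ_c (n_c / n) log2(n / n_c), and H_k(S) is the length-weighted average of the zero-order entropies of the symbols that follow each length-k context in S. The function names `h0` and `hk` are hypothetical, chosen for this example.

```python
from collections import Counter, defaultdict
from math import log2


def h0(seq):
    """Zero-order empirical entropy of seq, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # Sum over distinct symbols c of (n_c / n) * log2(n / n_c).
    return sum((c / n) * log2(n / c) for c in counts.values())


def hk(s, k):
    """k-th order empirical entropy of the string s, in bits per symbol.

    Assumes the standard definition: group every symbol s[i] (i >= k)
    by the k symbols preceding it, then average the zero-order entropy
    of each group, weighted by group size and normalized by n = len(s).
    """
    if k == 0:
        return h0(s)
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(k, n):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / n


if __name__ == "__main__":
    text = "abababab"
    print(h0(text))     # two equally frequent symbols
    print(hk(text, 1))  # each symbol is fully determined by its predecessor
```

For a string such as "abababab", the zero-order entropy is 1 bit per symbol, while the first-order entropy drops to 0 because each symbol is determined by the one before it. This is the kind of gap that a bound of nH_k(S) bits exploits relative to n log σ.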

Similar resources

Encoding Data Structures

In recent years, there has been an explosion of interest in succinct data structures, which store the given data in compact or compressed formats and answer queries on the data rapidly while it is still in its compressed format. Our focus in this talk is to introduce encoding data structures. Encoding data structures consider the data together with the queries and aim to store only as much info...

Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees

We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. The main advantage of succinct indexes as o...

Succinct Indexes

This thesis defines and designs succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. As opposed to succinct (integrat...

A Framework of Dynamic Data Structures for String Processing

In this paper we present DYNAMIC, an open-source C++ library implementing dynamic compressed data structures for string manipulation. Our framework includes useful tools such as searchable partial sums, succinct/gap-encoded bitvectors, and entropy/run-length compressed strings and FM-indexes. We prove close-to-optimal theoretical bounds for the resources used by our structures, and show that ou...

Publication date: 2006